10. Attention Encoder & Decoder

In machine translation applications, the encoder and decoder are typically what kind of neural network?

SOLUTION: Recurrent Neural Networks (typically a vanilla RNN, LSTM, or GRU)
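
As a minimal sketch, an encoder-decoder pair built from recurrent networks might look like the following PyTorch code. This is illustrative, not the course's implementation: the GRU choice, the class names, and sizes like `hidden_size` are all assumptions.

```python
import torch.nn as nn

class Encoder(nn.Module):
    """Reads the source sentence and returns its hidden states."""
    def __init__(self, vocab_size, embed_size, hidden_size):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_size)
        self.rnn = nn.GRU(embed_size, hidden_size, batch_first=True)

    def forward(self, src):                    # src: (batch, src_len)
        embedded = self.embedding(src)         # (batch, src_len, embed_size)
        outputs, hidden = self.rnn(embedded)   # outputs: (batch, src_len, hidden_size)
        return outputs, hidden

class Decoder(nn.Module):
    """Generates the target sentence one token at a time."""
    def __init__(self, vocab_size, embed_size, hidden_size):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_size)
        self.rnn = nn.GRU(embed_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, token, hidden):          # token: (batch, 1)
        embedded = self.embedding(token)       # (batch, 1, embed_size)
        output, hidden = self.rnn(embedded, hidden)
        return self.out(output.squeeze(1)), hidden  # logits over the vocabulary
```

The decoder is typically initialized with the encoder's final hidden state and fed one previously generated (or ground-truth) token at a time.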

Word Embeddings

What's a more reasonable embedding size for a real-world application?

SOLUTION: 200 (real-world embedding sizes commonly fall in the 100 to 300 range)
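
For instance, an embedding layer of that size is one line in PyTorch. The 10,000-word vocabulary below is a hypothetical figure chosen just for the example.

```python
import torch
import torch.nn as nn

# Hypothetical 10,000-word vocabulary; each word maps to a 200-dim vector.
embedding = nn.Embedding(num_embeddings=10_000, embedding_dim=200)

token_ids = torch.tensor([[3, 17, 42]])  # a batch with one 3-token sentence
vectors = embedding(token_ids)           # shape: (1, 3, 200)
print(vectors.shape)                     # torch.Size([1, 3, 200])
```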

Which steps require calculating an attention vector in a seq2seq model with attention?

SOLUTION: Every time step in the decoder only (the encoder does not compute attention vectors)
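
A minimal sketch of what happens at one such decoder step, assuming dot-product (multiplicative) scoring; the function name and shapes are illustrative:

```python
import torch
import torch.nn.functional as F

def attention_context(decoder_hidden, encoder_outputs):
    """Compute one attention vector for the current decoder time step.

    decoder_hidden:  (batch, hidden_size)          current decoder state
    encoder_outputs: (batch, src_len, hidden_size) all encoder hidden states
    """
    # Dot-product score of the decoder state against every encoder state.
    scores = torch.bmm(encoder_outputs, decoder_hidden.unsqueeze(2))  # (batch, src_len, 1)
    weights = F.softmax(scores, dim=1)                                # attention weights
    # Weighted sum of encoder states -> context (attention) vector.
    context = torch.bmm(weights.transpose(1, 2), encoder_outputs)     # (batch, 1, hidden_size)
    return context.squeeze(1), weights.squeeze(2)

# The decoder calls this once per output time step:
batch, src_len, hidden_size = 2, 5, 8
enc_out = torch.randn(batch, src_len, hidden_size)
dec_h = torch.randn(batch, hidden_size)
context, weights = attention_context(dec_h, enc_out)
print(context.shape, weights.shape)  # torch.Size([2, 8]) torch.Size([2, 5])
```

Because the decoder state changes at each time step, the scores, and therefore the attention vector, must be recomputed for every output token.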